A multiple-time-scale comparative study for the added value of magnetic resonance imaging-based radiomics in predicting pathological complete response after neoadjuvant chemoradiotherapy in locally advanced rectal cancer

Objective: Radiomics based on magnetic resonance imaging (MRI) shows potential for predicting the therapeutic response to neoadjuvant chemoradiotherapy (nCRT) in locally advanced rectal cancer (LARC); however, thorough comparisons between radiomic and traditional models are lacking. We aimed to construct multiple-time-scale (pretreatment, posttreatment, and combined) radiomic models to predict pathological complete response (pCR) and to compare their utility with that of traditional clinical models.
Methods: In this study, 165 LARC patients who underwent nCRT followed by surgery were enrolled retrospectively and divided into training and testing sets at a ratio of 7:3. Morphological features on pre- and posttreatment MRI, together with clinical data, were evaluated by univariable and multivariable logistic regression analysis to construct the clinical models. Radiomic parameters were derived from pre- and posttreatment T2- and diffusion-weighted images to develop the radiomic signatures. The clinical-radiomics models were then generated. All models were developed in the training set and evaluated in the testing set, and their performance was assessed using the area under the receiver operating characteristic curve (AUC). Radiomic models were compared with the clinical models using the DeLong test.
Results: One hundred and sixty-five patients (median age, 55 years; age interquartile range, 47–62 years; 116 males) were enrolled in the study. The pretreatment maximum tumor length, posttreatment maximum tumor length, and magnetic resonance tumor regression grade were selected as independent predictors of pCR in the clinical models. In the testing set, the pretreatment, posttreatment, and combined clinical models generated AUCs of 0.625, 0.842, and 0.842 for predicting pCR, respectively. The MRI-based radiomic models performed reasonably well in predicting pCR, but neither the pure radiomic signatures (AUCs, 0.734, 0.817, and 0.801 for the pretreatment, posttreatment, and combined radiomic signatures, respectively) nor the clinical-radiomics models (AUCs, 0.734, 0.860, and 0.801 for the pretreatment, posttreatment, and combined clinical-radiomics models, respectively) showed significant added value compared with the clinical models (all P > 0.05).
Conclusion: The MRI-based radiomic models exhibited no definite added value compared with the clinical models for predicting pCR in LARC. Radiomic models can serve as ancillary tools for tailoring adequate treatment strategies.


Introduction
Neoadjuvant chemoradiotherapy (nCRT) followed by surgical resection is the standard treatment for locally advanced rectal cancer (LARC). Approximately 15-27% of LARC patients treated with nCRT achieve pathological complete response (pCR) (1). Given the high operation-related morbidity (surgical complications and bowel and urogenital dysfunction) and the profound lifestyle changes that follow surgery (2,3), investigators have proposed less invasive alternatives, such as a "watch-and-wait" regimen or local resection (4,5), for patients with a good response to nCRT. To implement these less invasive approaches safely and effectively, precise identification of patients with pCR is a crucial step.
Magnetic resonance imaging (MRI) is generally recognized as the standard imaging modality for the primary evaluation and restaging of rectal cancer (6,7). Several MRI characteristics, including tumor volume, signal intensity, and magnetic resonance tumor regression grade (mrTRG) (8-10), have been investigated for predicting pCR. However, there is no consensus on a reliable and reproducible method for accurate preoperative prediction. The mrTRG, proposed by the MERCURY research team (11), was shown in several studies (10,12,13) to be significantly associated with pCR, although a recent meta-analysis reported that mrTRG had high specificity (93.5%) but low sensitivity (32.3%) for pCR (14).
Radiomics, which provides non-visual information related to tumor heterogeneity by extracting many quantitative parameters from digital images, has recently been applied to predict treatment response in rectal cancer. A few studies have reported promising results for predicting pCR in LARC using MRI-based texture or radiomic parameters, but they have substantial limitations, including the use of single-timepoint models (15-17), single-sequence radiomics analysis (16), and a lack of independent validation (18). Thus, multiple-timepoint models based on multiparametric MRI are required to establish the value of MRI-based radiomics for pCR assessment and to move radiomics toward practical use.
Therefore, our study aimed to develop and validate radiomic models based on multiple MRI timepoints and sequences (T2- and diffusion-weighted imaging, T2WI and DWI), and to compare the value of radiomic models with that of traditional clinical models in predicting pCR in LARC.

Patients
Our research was approved by the institutional ethics committee, which waived the requirement for informed consent owing to the retrospective nature of the study. We reviewed consecutive patients who underwent rectal MRI scanning from […] Epidemiological parameters and tumor marker levels, including age, sex, and pre- and posttreatment carbohydrate antigen 19-9 (CA19-9) and carcinoembryonic antigen (CEA) levels, were obtained from the electronic medical database at our institute.

Neoadjuvant chemoradiotherapy and surgery
All patients received long- or short-course nCRT before surgery. Long-course nCRT consisted of radiotherapy of 45-50.4 Gy to the whole pelvis (5 times per week for 5 weeks) with concurrent chemotherapy (825 mg/m² capecitabine orally, twice a day). Short-course nCRT consisted of radiotherapy of 25 Gy in total, delivered in fractions of 5 Gy, followed 7-14 days after completion of radiotherapy by four cycles of chemotherapy (130 mg/m² oxaliplatin intravenously, once a day, on day 1, and 1000 mg/m² capecitabine orally, twice a day, on days 1-14). All nCRT was followed by surgical resection, including abdominoperineal resection, low anterior resection, and Hartmann's operation.

Histopathological assessment
Each surgical specimen was evaluated by a pathologist with 21 years of diagnostic experience in gastrointestinal pathology, who was blinded to the imaging data, in accordance with the 8th edition of the American Joint Committee on Cancer (AJCC) TNM staging system (19). pCR (ypT0N0) was defined as the absence of residual tumor in the primary tumor bed and lymph nodes.

MRI parameters and imaging acquisition
MRI scanning was conducted using a Discovery MR 750 (GE Healthcare, Chicago, IL), a 3.0-T MRI system with a phased-array surface coil. Raceanisodamine hydrochloride (10 mg) was injected intramuscularly before scanning to suppress bowel motility (except in patients with contraindications). Additionally, ultrasound transmission gel (50-60 mL) was injected into the enteric cavity through rectal intubation to highlight the tumor boundary and enhance contrast. Axial T1-weighted imaging (T1WI); axial fat-saturated T2WI (T2WI/FS); axial DWI; and oblique axial, coronal, and sagittal T2WI were obtained at two timepoints (pre- and posttreatment). The detailed parameters of the MRI sequences are presented in Supplementary Table S1.

MRI morphological evaluation
MRI morphological evaluation, including distance from tumor to anal verge (DTA), maximum tumor length (MTL), maximum tumor thickness (MTT), circumferential percentage (CP), mrT (ymrT), mrN (ymrN), mesorectal fascia (MRF), extramural vascular invasion (EMVI), and mrTRG, was performed on pre- and posttreatment MR images. These parameters were evaluated by a radiologist with 21 years of diagnostic experience in gastrointestinal imaging, who was aware only of the biopsy-proven pathological diagnosis.
DTA was measured on sagittal T2WI from the anal verge to the lowest margin of the tumor. MTL was recorded as the maximum longitudinal extent from the upper to the lower margin of the tumor on sagittal T2WI. MTT and CP were assessed on the oblique axial T2WI slice with the maximum tumor dimension. MTT was recorded as the perpendicular distance between the outer margin of the tumor extension and the rectal wall, and CP as the proportion of the rectal wall circumference involved by tumor, graded in four degrees (degree 1, 0-0.25; 2, > 0.25-0.5; 3, > 0.5-0.75; and 4, > 0.75-1).
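The four-degree CP scale described above can be written as a small lookup. This is only an illustrative sketch: the function name `cp_degree` is hypothetical, and the handling of the exact boundary values (0.25, 0.5, 0.75 fall in the lower degree) follows the intervals as stated in the text.

```python
def cp_degree(proportion: float) -> int:
    """Map the circumferential proportion (0-1) to the four-degree CP scale.

    Intervals follow the text: degree 1, 0-0.25; 2, >0.25-0.5;
    3, >0.5-0.75; 4, >0.75-1.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    if proportion <= 0.25:
        return 1
    if proportion <= 0.5:
        return 2
    if proportion <= 0.75:
        return 3
    return 4
```

For instance, a tumor involving 60% of the circumference would be assigned degree 3.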
The mrT (ymrT) and mrN (ymrN) staging followed the 8th edition of the AJCC staging system (19). Metastatic lymph nodes in the primary evaluation and in restaging after nCRT were determined according to the consensus recommended by the European Society of Gastrointestinal and Abdominal Radiology (ESGAR) (7). MRF invasion was defined as a distance of 1 mm or less from the tumor spiculae to the MRF (7). EMVI was evaluated using a five-point scoring system (20). mrTRGs were assigned in accordance with the description by the MERCURY study group (11): mrTRG1, complete regression, the primary tumor bed without residual tumor signal; mrTRG2, dense low signal fibrosis with minimal tumor signal; mrTRG3, substantial tumor signal; mrTRG4, small areas of fibrosis outgrown by residual tumor; and mrTRG5, extensive residual cancer with no regression or tumor growth. mrTRG1 was considered pCR, whereas mrTRG2-5 were considered non-pCR.
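The dichotomization of mrTRG (grade 1 read as pCR, grades 2-5 as non-pCR) and the resulting sensitivity and specificity against pathology can be sketched as follows. The function names and the toy inputs in the usage note are hypothetical and are not data from this study.

```python
def mrtrg_predicts_pcr(grade: int) -> bool:
    """mrTRG1 is read as pCR; mrTRG2-5 are read as non-pCR."""
    if grade not in (1, 2, 3, 4, 5):
        raise ValueError("mrTRG must be an integer from 1 to 5")
    return grade == 1


def sensitivity_specificity(grades, pcr_truth):
    """Compare dichotomized mrTRG with pathological pCR status.

    grades    : iterable of mrTRG grades (1-5)
    pcr_truth : iterable of booleans, True when pathology showed pCR
    """
    tp = fn = fp = tn = 0
    for grade, is_pcr in zip(grades, pcr_truth):
        predicted = mrtrg_predicts_pcr(grade)
        if is_pcr and predicted:
            tp += 1
        elif is_pcr:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both pCR and non-pCR cases are required")
    return tp / (tp + fn), tn / (tn + fp)
```

Calling `sensitivity_specificity([1, 1, 2, 3, 1, 5], [True, False, True, False, True, False])` on this made-up example returns the pair (sensitivity, specificity) for six hypothetical patients.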

Imaging segmentation and radiomic feature extraction
The delineation of regions of interest (ROIs) and radiomic feature extraction were performed on the pre- and posttreatment oblique axial T2WI and axial DWI using Radcloud version 3.1.0, which is based on the "pyradiomics" package in Python version 3.8.1. Reader 1 (a junior radiologist with three years of diagnostic experience in gastrointestinal imaging) and reader 2 (a senior radiologist with 16 years of diagnostic experience in gastrointestinal imaging) reviewed each imaging set to reach a consensus on the ROIs. Reader 1 first manually delineated the tumor on each consecutive tumor-containing slice; the tumor showed intermediate T2WI signal and high DWI signal in contrast with the normal signal of the muscular layer of the adjacent rectal wall. In some patients, no tumor signal was identified on posttreatment images, and these ROIs were positioned at the location of the pretreatment tumor bed (21). Figure 1 shows two examples of segmentation of ROIs on the pre- and posttreatment images. Reader 2 then examined these ROIs, and any discrepancy was resolved by discussion until consensus was reached. Both readers were unaware of the pathological results and clinical data. Image intensity normalization and resampling to a voxel size of 1 × 1 × 1 mm³ were performed. Radiomic features were then extracted automatically. The types of features are listed in Supplementary Table S2. A total of 1,409 parameters were extracted from each modality (T2WI and DWI at each of the two timepoints), giving 5,636 parameters per patient.

Radiomic feature selection and signature construction
Z-scores were used to normalize the radiomic features in order to avoid the influence of differing feature magnitudes. Irrelevant or redundant features were eliminated, and 30 parameters with high relevance and low redundancy were retained using the maximum relevance minimum redundancy (mRMR) algorithm. The optimal feature subset was then identified using the least absolute shrinkage and selection operator (LASSO), with the tuning parameter (λ) chosen by comparing 10-fold cross-validation performance in the training set. Each patient's pre- and posttreatment radscores were calculated as a weighted linear combination of the selected predictors. The pre- and posttreatment radscores, regarded as two independent radiomic signatures, were each compared between the pCR and non-pCR groups using the Mann-Whitney U test. The combined radiomic signature was generated by integrating the pre- and posttreatment radscores using multivariable logistic regression (selection method, Backward: LR).
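To make the last two steps concrete, the sketch below shows z-score normalization followed by a radscore computed as a weighted linear combination. It is a minimal illustration and not the Radcloud implementation: the function names, the use of the sample standard deviation, the fitting of the normalization on the training set only, and the coefficient values in the usage note are all assumptions. In practice the weights would be the nonzero LASSO coefficients.

```python
from statistics import mean, stdev


def fit_zscore(train_rows):
    """Return per-feature (mean, sd) estimated on the training rows.

    Assumption: sample standard deviation; a constant feature (sd = 0)
    would need to be dropped before calling apply_zscore.
    """
    columns = list(zip(*train_rows))
    return [(mean(col), stdev(col)) for col in columns]


def apply_zscore(row, params):
    """Standardize one patient's feature vector with the fitted parameters."""
    return [(x - m) / s for x, (m, s) in zip(row, params)]


def radscore(z_row, coefficients, intercept=0.0):
    """Weighted linear combination of the selected, standardized features."""
    return intercept + sum(c * z for c, z in zip(coefficients, z_row))
```

For example, `radscore(apply_zscore([3.0, 10.0], fit_zscore(train)), [2.0, 0.5], intercept=-0.5)` scores one patient, where `train` is a list of training feature vectors and the weights are purely hypothetical.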

Statistical analysis
We used R (version 4.1.1, R Foundation, Vienna, Austria) and IBM SPSS Statistics (version 25.0, Chicago, IL) to conduct the statistical analyses. Clinical characteristics, including demographic data, levels of tumor markers, and MRI morphological parameters, were analyzed. The Shapiro-Wilk test was used to assess normality. Differences between the pCR and non-pCR groups were analyzed using the independent t-test for normally distributed continuous variables and the Mann-Whitney U test for non-normally distributed continuous variables. Categorical data were evaluated with the χ² test or Fisher's exact test. Two-sided P-values < 0.05 were considered statistically significant.
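As an illustration of one of the tests named above, the following sketch computes a two-sided Fisher's exact test for a 2×2 table from the hypergeometric distribution. It is not the SPSS or R routine used in the study; the function name is hypothetical, and the two-sided rule (summing the probabilities of all tables no more likely than the observed one) is a common convention assumed here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

With small expected cell counts, where the χ² approximation is unreliable, this exact calculation is the usual fallback.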
The clinical variables were first assessed by univariable logistic regression analysis for their association with pCR. The significant variables were then analyzed by multivariable logistic regression (selection method, Backward: LR) to identify the independent predictors used to construct the pretreatment, posttreatment, and combined clinical models. The clinical-radiomics models were established in the same way, except that the pre- and posttreatment radscores were added as independent radiomic signatures. All models were established in the training group and validated in the testing group. The utility of the models was evaluated using the areas under the receiver operating characteristic curves (AUCs). Bootstrapping was used to generate 95% confidence intervals (CIs). The DeLong test was conducted to compare the AUCs between models. The overall workflow of the comparative study is presented in Figure 2.
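The evaluation steps above (AUC, bootstrap CI, and DeLong comparison of two models scored on the same patients) can be sketched in plain Python. This is a simplified illustration and not the code used in the study: the function names, the percentile bootstrap, the number of resamples, and the seed are all assumptions. Labels are 1 for pCR and 0 for non-pCR, and at least two patients of each class are needed for the variance estimate.

```python
import math
import random
from statistics import mean, variance


def _psi(x, y):
    """Kernel for the AUC: 1 if the pCR score is higher, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def placements(labels, scores):
    """DeLong structural components for positives (V10) and negatives (V01)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    v10 = [mean([_psi(p, n) for n in neg]) for p in pos]
    v01 = [mean([_psi(p, n) for p in pos]) for n in neg]
    return v10, v01


def auc(labels, scores):
    """Empirical AUC (equivalent to the Mann-Whitney statistic)."""
    v10, _ = placements(labels, scores)
    return mean(v10)


def delong_test(labels, scores_a, scores_b):
    """Return (AUC_a - AUC_b, two-sided P) for two correlated ROC curves."""
    v10a, v01a = placements(labels, scores_a)
    v10b, v01b = placements(labels, scores_b)
    d10 = [a - b for a, b in zip(v10a, v10b)]
    d01 = [a - b for a, b in zip(v01a, v01b)]
    diff = mean(v10a) - mean(v10b)
    var = variance(d10) / len(d10) + variance(d01) / len(d01)
    if var == 0:
        # Identical curves (or a degenerate difference): no z-statistic.
        return diff, (1.0 if diff == 0 else 0.0)
    z = diff / math.sqrt(var)
    return diff, math.erfc(abs(z) / math.sqrt(2))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling patients)."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples with only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The explicit zero-variance branch matters when two models produce identical scores, in which case their ROC curves coincide and the test statistic is undefined.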

Radiomic feature selection and signature construction
Three pretreatment and eight posttreatment radiomic parameters were identified as the optimal predictors for the respective radiomic signatures. Detailed information on the contributing weights of the selected radiomic features is shown in Supplementary Figure S1. Both the pre- and posttreatment radscores of pCR patients in the training set were larger than those of non-pCR patients (median: pretreatment radscore, -1.11 vs. -2.22, P = 0.011; posttreatment radscore, 0.42 vs. -1.50, P < 0.001). The same direction was observed in the testing cohort, although the pretreatment difference did not reach significance (median: pretreatment radscore, -0.07 vs. -2.39, P = 0.415; posttreatment radscore, 0.32 vs. -1.32, P = 0.014). Figure 3 shows box plots of the pre- and posttreatment radscores in the training and testing sets for the pCR and non-pCR groups. The pre- and posttreatment radscores were entered into the combined radiomic signature as independent predictors. The pretreatment, posttreatment, and combined radiomic signatures exhibited AUCs of 0.775-0.887 in the training group and 0.734-0.817 in the testing group.
In the three clinical-radiomics models, the independent predictors were as follows: the pretreatment radscore for the pretreatment clinical-radiomics model; the posttreatment MTL (OR, 0.929; 95% CI, 0.862-1) and posttreatment radscore (OR, 2.236; 95% CI, 1.471-3.400) for the […] Table 3 and Figure 4 show the AUCs of the models. The combined radiomic signature, as well as the combined clinical-radiomics model, achieved the highest AUC in the training group (AUC, 0.887; 95% CI, 0.815-0.958), whereas the posttreatment clinical-radiomics model achieved the highest AUC in the testing group (AUC, 0.860; 95% CI, 0.751-0.970). Furthermore, comparisons of the MRI-based radiomic models with the clinical models showed that neither the pure radiomic signatures nor the clinical-radiomics models of the three timepoints differed significantly from the clinical models (all P > 0.05), in both the training and testing sets. There were also no significant differences between the pure radiomic signatures and the clinical-radiomics models of the three timepoints (all P > 0.05). Table 4 presents the comparisons between the models.
FIGURE 2 The overall workflow of the comparative study. CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; CP, circumferential percentage; DTA, distance from tumor to anal verge; EMVI, extramural vascular invasion; MRF, mesorectal fascia; MRI, magnetic resonance imaging; mrTRG, magnetic resonance tumor regression grade; MTL, maximum tumor length; MTT, maximum tumor thickness; pCR, pathological complete response.

Discussion
The precise identification of LARC patients with pCR after nCRT has become a crucial issue, because these patients may be candidates for less invasive approaches, such as a "watch-and-wait" regimen or local resection. MRI-based radiomics shows potential for predicting pCR, but thorough comparisons between radiomic and traditional models are lacking. In this study, we developed and validated multiple-time-scale (pretreatment, posttreatment, and combined) radiomic models based on MRI to predict pCR and compared their utility with that of traditional clinical models. Radiomic models performed reasonably well for predicting pCR in LARC. However, neither the pure radiomic signatures nor the clinical-radiomics models of the three timepoints showed a definite added value over the traditional clinical models.
The pretreatment radiomic signature generated moderate AUCs of 0.734-0.775 for predicting pCR, which is concordant with the results of previous studies (18,22,23). Meanwhile, the posttreatment and combined radiomic signatures obtained higher AUCs (0.801-0.887) than the pretreatment one. The superiority of posttreatment imaging agrees with a previous investigation (24) and is plausible, because posttreatment imaging is more directly linked to the pathological results. On closer inspection of the pure radiomic signatures, and consistent with prior studies (25), high-order radiomic features carried high weights in the models: 3/3 of the pretreatment and 4/8 of the posttreatment features were wavelet features, which reflect the rate of change of pixel values in the frequency domain (26), represent the complexity and heterogeneity of tumors, and may better predict pCR.
The performance of the clinical models ranged from poor to good in our research. The pretreatment clinical model, based on MTL alone, had the lowest prediction performance (AUC, 0.625) in the testing set, which indicates that pretreatment morphological features alone are insufficient for prediction. Even so, the pretreatment radiomic models (both the pure radiomic signature and the clinical-radiomics model) did not perform significantly better than the clinical model. The posttreatment clinical model generated higher AUCs than the pretreatment one, and the combined clinical model retained only the posttreatment clinical features as independent predictors after backward multivariable logistic regression selection, which implies the superior predictive utility of posttreatment clinical features. Both the posttreatment and combined clinical models were based on the posttreatment MTL and mrTRG. Posttreatment MTL has been recognized in previous studies as an effective morphological predictor of pCR (24,27,28). Another promising predictor was mrTRG, which reflects the tumor signal status after treatment and correlates highly with tumor response (10,29). When the posttreatment radiomic models (both the pure radiomic signature and the clinical-radiomics model) were compared with the clinical model, no significant added value emerged, and the same was true of the combined models. Our results are consistent with a handful of previous reports. Shi et al. reported that a pretreatment radiomic model could predict pCR but showed no significant difference from the clinical model; however, that conclusion lacked independent validation (18). Bulens et al. showed in an external validation cohort that neither the pure radiomic model based on pre- and posttreatment MRI nor the clinical-radiomics model outperformed the clinical model in predicting (near-)pCR; however, that study did not stratify the analysis by timepoint (30).
The strength of our study lies in its multiple-timepoint and multiple-modality comparative analysis, which is rare in current radiomics research. Studies employing radiomics to predict pCR in LARC have grown exponentially in recent years, yet few have clarified the usefulness of MRI-based radiomic models, especially in comparison with traditional clinical ones. Our study places radiomics in a more practical context and offers conclusions that are generally applicable and relevant to clinical practice. Given its uncertain superiority over clinical models and the laborious, intricate process of radiomic analysis, radiomic modeling is far from routine in clinical practice. Clinical parameters, including emerging ones such as histopathological, immunohistochemical, and genetic markers, still deserve further investigation.
It is worth noting that radiomics is not without advantages. In routine clinical practice, mrTRG assessment can vary between radiologists with different levels of experience, and in a two-observer review the final diagnosis is usually made by the senior radiologist; we therefore used the mrTRG results assessed by a senior radiologist in our analysis. Consistent with the literature (specificity, 92-98%; sensitivity, 0-59%) (14), mrTRG in our study achieved moderate to high specificity (0.791-0.865) and low sensitivity (0.571-0.579). In contrast, radiomics analysis may not require expert input. It has been reported (31) that significant predictive performance can be achieved regardless of whether the radiomics ROI segmentation is performed by an experienced radiologist or a junior resident, and in our study the ROIs were first drawn by a junior radiologist. Radiomics can therefore serve as a supplementary tool when a senior radiologist is unavailable, adding confidence to treatment response assessment and helping to tailor treatment strategies.
There were several limitations in this research. First, it was a retrospective analysis with a limited dataset from a single institute, which might carry inherent selection bias. Prospective and multicenter external validation deserves further investigation. Second, many factors can potentially affect the reproducibility of radiomic features, such as scanning sequence, data acquisition, image preprocessing, segmentation strategy, and feature extraction tools. In this study, we took several measures (e.g., ROIs were delineated by two radiologists in consensus, and whole-volume segmentation was used) to improve feature reproducibility. Therefore, we did not evaluate the interobserver and intraobserver reproducibility of segmentation. Third, other functional sequences such as dynamic contrast-enhanced MRI (DCE-MRI) and apparent diffusion coefficient (ADC) maps were not included in our study. DCE-MRI is not routinely applied in rectal MRI examinations, and ADC maps are vulnerable to geometric distortion and susceptibility artifacts.
In conclusion, our study showed that MRI-based radiomic models performed reasonably well for the prediction of pCR in LARC, but exhibited no definite added value compared with the traditional clinical models. Radiomic models can serve as ancillary tools for selecting candidate pCR patients and tailoring adequate treatment strategies.
FIGURE 1 Two examples of segmentation of ROIs on the pre- and posttreatment images. Panels (A-D) show the segmentation of a 51-year-old patient with low rectal adenocarcinoma at stage cT3N1M0. (A, B) Pretreatment oblique axial T2WI and axial DWI (b = 1000 s/mm²); (C, D) posttreatment oblique axial T2WI and axial DWI (b = 1000 s/mm²); this patient was confirmed to have pCR by surgical pathology. Panels (E-H) show the segmentation of a 60-year-old patient with low rectal adenocarcinoma at stage cT3N1M0. (E, F) Pretreatment oblique axial T2WI and axial DWI (b = 1000 s/mm²); (G, H) posttreatment oblique axial T2WI and axial DWI (b = 1000 s/mm²); this patient was confirmed to have non-pCR by surgical pathology. DWI, diffusion-weighted imaging; pCR, pathological complete response; ROI, region of interest; T2WI, T2-weighted imaging.

FIGURE 3 Box plots for the pre- and posttreatment radscores of the pCR and non-pCR groups. Panels (A, B) show the pretreatment radscores in the training and testing sets, respectively. Panels (C, D) show the posttreatment radscores in the training and testing sets, respectively. pCR, pathological complete response.

FIGURE 4 The receiver operating characteristic curves for the clinical, radiomics, and clinical-radiomics models of three timepoints. (A-C) Curves for the pretreatment, posttreatment, and combined models in the training set, respectively; (D-F) curves for the pretreatment, posttreatment, and combined models in the testing set, respectively. (Curves for the pretreatment and combined clinical-radiomics models coincide with those of the pretreatment and combined radiomic signatures, respectively, which are displayed in red.)

TABLE 1
Patients' clinical characteristics in the training and testing sets.
Age is expressed as median with interquartile range in parentheses; other measurements are expressed as numbers of patients with percentages in parentheses. CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen. a: Mann-Whitney U test. b: χ² test. c: Fisher's exact test.

TABLE 2
Pre-and posttreatment MRI morphological characteristics of patients in the training and testing sets.

Pre- and posttreatment DTA, MTL, and MTT are expressed as median with interquartile range in parentheses; other measurements are expressed as numbers of patients with percentages in parentheses. CP, circumferential percentage; DTA, distance from tumor to anal verge; EMVI, extramural vascular invasion; MRF, mesorectal fascia; MRI, magnetic resonance imaging; mrTRG, magnetic resonance tumor regression grade; MTL, maximum tumor length; MTT, maximum tumor thickness.
a: Mann-Whitney U test. b: χ² test. c: Fisher's exact test.

TABLE 3
The areas under the receiver operating characteristic curves of multiple-time-scale models.